Goto

Collaborating Authors

 two-step approach


Optimizing affinity-based binary hashing using auxiliary coordinates

Neural Information Processing Systems

In supervised binary hashing, one wants to learn a function that maps a highdimensional feature vector to a vector of binary codes, for application to fast image retrieval. This typically results in a difficult optimization problem, nonconvex and nonsmooth, because of the discrete variables involved. Much work has simply relaxed the problem during training, solving a continuous optimization, and truncating the codes a posteriori. This gives reasonable results but is quite suboptimal. Recent work has tried to optimize the objective directly over the binary codes and achieved better results, but the hash function was still learned a posteriori, which remains suboptimal. We propose a general framework for learning hash functions using affinity-based loss functions that uses auxiliary coordinates. This closes the loop and optimizes jointly over the hash functions and the binary codes so that they gradually match each other. The resulting algorithm can be seen as an iterated version of the procedure of optimizing first over the codes and then learning the hash function. Compared to this, our optimization is guaranteed to obtain better hash functions while being not much slower, as demonstrated experimentally in various supervised datasets.


Review for NeurIPS paper: A new inference approach for training shallow and deep generalized linear models of noisy interacting neurons

Neural Information Processing Systems

Additional Feedback: - The authors claim that empirically they do not need large amounts of repeated stimuli for the method to work. This empirical claim is based on only a single experimental dataset. It would be nice to see some theoretical analysis or exploration into how much data is needed for this to work -- presumably if my data has only 2 repeats of a stimulus then the h_stim auxilliary variable could be very poorly estimated. This introduces a bias into the results of the model, but how bad is this bias? Is this correction procedure provably optimal in some way?


Exploring Large Language Models for Product Attribute Value Identification

arXiv.org Artificial Intelligence

Product attribute value identification (PAVI) involves automatically identifying attributes and their values from product information, enabling features like product search, recommendation, and comparison. Existing methods primarily rely on fine-tuning pre-trained language models, such as BART and T5, which require extensive task-specific training data and struggle to generalize to new attributes. This paper explores large language models (LLMs), such as LLaMA and Mistral, as data-efficient and robust alternatives for PAVI. We propose various strategies: comparing one-step and two-step prompt-based approaches in zero-shot settings and utilizing parametric and non-parametric knowledge through in-context learning examples. We also introduce a dense demonstration retriever based on a pre-trained T5 model and perform instruction fine-tuning to explicitly train LLMs on task-specific instructions. Extensive experiments on two product benchmarks show that our two-step approach significantly improves performance in zero-shot settings, and instruction fine-tuning further boosts performance when using training data, demonstrating the practical benefits of using LLMs for PAVI.


Multilingual De-Duplication Strategies: Applying scalable similarity search with monolingual & multilingual embedding models

arXiv.org Artificial Intelligence

This paper addresses the deduplication of multilingual textual data using advanced NLP tools. We compare a two-step method involving translation to English followed by embedding with mpnet, and a multilingual embedding model (distiluse). The two-step approach achieved a higher F1 score (82% vs. 60%), particularly with less widely used languages, which can be increased up to 89% by leveraging expert rules based on domain knowledge. We also highlight limitations related to token length constraints and computational efficiency. Our methodology suggests improvements for future multilingual deduplication tasks.


Learning and Optimization for Price-based Demand Response of Electric Vehicle Charging

arXiv.org Artificial Intelligence

In the context of charging electric vehicles (EVs), the price-based demand response (PBDR) is becoming increasingly significant for charging load management. Such response usually encourages cost-sensitive customers to adjust their energy demand in response to changes in price for financial incentives. Thus, to model and optimize EV charging, it is important for charging station operator to model the PBDR patterns of EV customers by precisely predicting charging demands given price signals. Then the operator refers to these demands to optimize charging station power allocation policy. The standard pipeline involves offline fitting of a PBDR function based on historical EV charging records, followed by applying estimated EV demands in downstream charging station operation optimization. In this work, we propose a new decision-focused end-to-end framework for PBDR modeling that combines prediction errors and downstream optimization cost errors in the model learning stage. We evaluate the effectiveness of our method on a simulation of charging station operation with synthetic PBDR patterns of EV customers, and experimental results demonstrate that this framework can provide a more reliable prediction model for the ultimate optimization process, leading to more effective optimization solutions in terms of cost savings and charging station operation objectives with only a few training samples.


Two-step Automated Cybercrime Coded Word Detection using Multi-level Representation Learning

arXiv.org Artificial Intelligence

In social network service platforms, crime suspects are likely to use cybercrime coded words for communication by adding criminal meanings to existing words or replacing them with similar words. For instance, the word 'ice' is often used to mean methamphetamine in drug crimes. To analyze the nature of cybercrime and the behavior of criminals, quickly detecting such words and further understanding their meaning are critical. In the automated cybercrime coded word detection problem, it is difficult to collect a sufficient amount of training data for supervised learning and to directly apply language models that utilize context information to better understand natural language. To overcome these limitations, we propose a new two-step approach, in which a mean latent vector is constructed for each cybercrime through one of five different AutoEncoder models in the first step, and cybercrime coded words are detected based on multi-level latent representations in the second step. Moreover, to deeply understand cybercrime coded words detected through the two-step approach, we propose three novel methods: (1) Detection of new words recently coined, (2) Detection of words frequently appeared in both drug and sex crimes, and (3) Automatic generation of word taxonomy. According to our experimental results, among various AutoEncoder models, the stacked AutoEncoder model shows the best performance. Additionally, the F1-score of the two-step approach is 0.991, which is higher than 0.987 and 0.903 of the existing dark-GloVe and dark-BERT models. By analyzing the experimental results of the three proposed methods, we can gain a deeper understanding of drug and sex crimes.


Optimizing Affinity-Based Binary Hashing Using Auxiliary Coordinates

Neural Information Processing Systems

In supervised binary hashing, one wants to learn a function that maps a highdimensional feature vector to a vector of binary codes, for application to fast image retrieval. This typically results in a difficult optimization problem, nonconvex and nonsmooth, because of the discrete variables involved. Much work has simply relaxed the problem during training, solving a continuous optimization, and truncating the codes a posteriori. This gives reasonable results but is quite suboptimal. Recent work has tried to optimize the objective directly over the binary codes and achieved better results, but the hash function was still learned a posteriori, which remains suboptimal. We propose a general framework for learning hash functions using affinity-based loss functions that uses auxiliary coordinates. This closes the loop and optimizes jointly over the hash functions and the binary codes so that they gradually match each other. The resulting algorithm can be seen as an iterated version of the procedure of optimizing first over the codes and then learning the hash function. Compared to this, our optimization is guaranteed to obtain better hash functions while being not much slower, as demonstrated experimentally in various supervised datasets.


Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction

arXiv.org Machine Learning

Quantifying a model's predictive uncertainty is essential for safety-critical applications such as autonomous driving. We consider quantifying such uncertainty for multi-object detection. In particular, we leverage conformal prediction to obtain uncertainty intervals with guaranteed coverage for object bounding boxes. One challenge in doing so is that bounding box predictions are conditioned on the object's class label. Thus, we develop a novel two-step conformal approach that propagates uncertainty in predicted class labels into the uncertainty intervals for the bounding boxes. This broadens the validity of our conformal coverage guarantees to include incorrectly classified objects, ensuring their usefulness when maximal safety assurances are required. Moreover, we investigate novel ensemble and quantile regression formulations to ensure the bounding box intervals are adaptive to object size, leading to a more balanced coverage across sizes. Validating our two-step approach on real-world datasets for 2D bounding box localization, we find that desired coverage levels are satisfied with actionably tight predictive uncertainty intervals.


Discrete-time Competing-Risks Regression with or without Penalization

arXiv.org Machine Learning

Many studies employ the analysis of time-to-event data that incorporates competing risks and right censoring. Most methods and software packages are geared towards analyzing data that comes from a continuous failure time distribution. However, failure-time data may sometimes be discrete either because time is inherently discrete or due to imprecise measurement. This paper introduces a novel estimation procedure for discrete-time survival analysis with competing events. The proposed approach offers two key advantages over existing procedures: first, it expedites the estimation process for a large number of unique failure time points; second, it allows for straightforward integration and application of widely used regularized regression and screening methods. We illustrate the benefits of our proposed approach by conducting a comprehensive simulation study. Additionally, we showcase the utility of our procedure by estimating a survival model for the length of stay of patients hospitalized in the intensive care unit, considering three competing events: discharge to home, transfer to another medical facility, and in-hospital death.


90% F1 Score in Relational Triple Extraction: Is it Real ?

arXiv.org Artificial Intelligence

Extracting relational triples from text is a crucial task for constructing knowledge bases. Recent advancements in joint entity and relation extraction models have demonstrated remarkable F1 scores ($\ge 90\%$) in accurately extracting relational triples from free text. However, these models have been evaluated under restrictive experimental settings and unrealistic datasets. They overlook sentences with zero triples (zero-cardinality), thereby simplifying the task. In this paper, we present a benchmark study of state-of-the-art joint entity and relation extraction models under a more realistic setting. We include sentences that lack any triples in our experiments, providing a comprehensive evaluation. Our findings reveal a significant decline (approximately 10-15\% in one dataset and 6-14\% in another dataset) in the models' F1 scores within this realistic experimental setup. Furthermore, we propose a two-step modeling approach that utilizes a simple BERT-based classifier. This approach leads to overall performance improvement in these models within the realistic experimental setting.